ABSTRACT


In an errors-in-variables regression model, the least squares estimate is
generally inconsistent for the complete regression parameter but can be
consistent for certain linear combinations of this parameter. We explore the
conjecture that, when least squares is consistent for a linear combination of
the regression parameter, it will be preferred to an errors-in-variables
estimate, at least asymptotically. The conjecture is false, in general, but
it is true for important classes of problems. One such problem is a
randomized two-group analysis of covariance, upon which we focus.


COMPARISONS OF LEAST SQUARES AND ERRORS-IN-VARIABLES REGRESSION
WITH SPECIAL REFERENCE TO RANDOMIZED ANALYSIS OF COVARIANCE


Raymond J. Carroll, Paul Gallo and Leon Jay Gleser


1.  Introduction 

The literature on the problem of linear regression when some of the predictors are
measured with error is substantial; see, for example, Reilly and Patino-Leal (1981).

Recent  work  includes  the  theoretical  study  of  Gleser  (1981)  and  the  important  practical 
shrinkage  suggestions  of  Fuller  (1980).  See  also  Anderson  (1984)  and  Healy  (1980). 

A subarea of this literature concerns two-group analysis of covariance when some of
the predictors are measured with error; see, for example, Lord (1960), Cochran (1968),
DeGracie and Fuller (1972) and Cronbach (1976).

Lord  (1960)  discusses  the  case  of  one  covariate  measured  with  error.  He  notes  that 
it  may  "happen  ...  that  the  usual  covariance  analysis  (least  squares)  will  fail  to  detect 
a  statistically  significant  difference  between  groups  ...  when  such  a  difference  actually 
exists  and  can  be  detected  by  proper  statistical  procedures."  He  also  gives  a  numerical 
example  of  this  phenomenon. 

Cochran (1968) and DeGracie and Fuller (1972) discuss two-group analysis of
covariance, providing in particular some discussion of the case that the true values of
the covariates are themselves random variables; this is usually called a "structural" model
in the literature. They show that if the covariables are unbalanced, as might happen in an
observational study, then the measurement error will cause least squares to estimate the
true treatment difference inconsistently. In the sense of asymptotics, when the covariables
are unbalanced one should then correct for measurement error if it is substantial; a
global small-sample statement of this type cannot be made.

Now  consider  a  completely  randomized  study,  where  the  covariables  will  be  balanced  on 
average  across  the  two  treatments.  In  this  case,  Cochran  (1980)  and  DeGracie  and  Fuller 
(1980)  indicate  that  least  squares  will  consistently  estimate  the  treatment  difference. 

The  question  which  remains  to  be  answered  is  "Should  we  correct  for  measurement  error  when 
the  least  squares  estimate  consistently  estimates  the  treatment  effect?"  It  is  the 
purpose of this note to partially answer this question. Using large sample distribution
theory,  we  show  that  in  a  balanced,  completely  randomized  study  with  measurement  error  in 
the  covariables,  the  least  squares  estimate  of  the  treatment  difference  will  be  generally 
preferred  when  compared  to  a  particular  errors-in-variables  regression  estimator.  It 
turns  out  that  this  result  can  be  generalized,  so  that  in  a  large  class  of  problems,  when 
least  squares  is  consistent  for  a  linear  combination  of  the  regression  parameter,  it  will 
be  preferred,  at  least  asymptotically.  Further,  for  a  smaller  but  not  insubstantial  class 
of  problems,  when  least  squares  is  consistent  for  a  linear  combination  of  the  regression 
parameter,  it  is  the  maximum  likelihood  estimate  of  this  linear  combination,  taking  the 
consistency  into  account. 


2. The Normal Case with no Replication; Technical Background


A  special  case  of  considerable  interest  occurs  when  all  errors  are  normally 
distributed  and  no  replicates  of  the  variables  measured  with  error  are  available.  The 
general  model  considered  here,  which  includes  the  analysis  of  covariance  as  a  special 
case, is  given  by 

$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad C = X_2 + U, \qquad \beta = [\beta_1^T, \beta_2^T]^T. \tag{2.1}$$

Here, $Y$ and $\varepsilon$ are $(N \times 1)$ vectors, $X_1$ is an $(N \times p)$ matrix observed without error
and $X_2$ is an $(N \times q)$ matrix of true values which we cannot observe exactly. Rather, we
observe $C$. The rows of the matrix $(U, \varepsilon)$ will be assumed to be jointly normally
distributed with mean zero and unknown covariance $\Sigma$.

In comparing least squares and errors-in-variables methods, we must pick a
representative member of the latter class. In the main, we will do this by following
Gleser (1981) for the case that no replicated estimates of $X_2$ are available; the
replicated case will be discussed at the end of the article. Gleser studies the
functional model in which $X_1, X_2$ are considered as fixed constants. A special case of
his model assumes that there is a known matrix $\Omega_0$ and an unknown constant $\theta$ for which

$$\Sigma = \theta\,\Omega_0. \tag{2.2}$$

If $\Sigma_u$ is the covariance matrix of the rows of $U$, then in (2.2) we are assuming that we
know the ratio of the elements of $\Sigma_u$ to $\sigma^2$, the variance of the elements of $\varepsilon$.

Gallo (1982) exhibits the maximum likelihood estimate of $\beta$, which is given in
Appendix 1. He also proves the following:


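Theorem 1 (From Gallo (1982)). Suppose that

$$\Lambda = \lim_{N\to\infty} N^{-1}(X_1, X_2)^T(X_1, X_2) \tag{2.3}$$

exists and is positive definite. Then if $\hat\beta_M$ is the functional maximum likelihood
estimate, $N^{1/2}(\hat\beta_M - \beta)$ is asymptotically normally distributed with zero mean and
covariance

$$\operatorname{Cov}(\hat\beta_M) = d\left\{\Lambda^{-1} + \Lambda^{-1}\begin{pmatrix} 0 & 0 \\ 0 & Q \end{pmatrix}\Lambda^{-1}\right\}, \quad \text{where}$$

$$d = [\beta_2^T, -1]\,\Sigma\,[\beta_2^T, -1]^T, \qquad Q^{-1} = [I, \beta_2]\,\Sigma^{-1}\,[I, \beta_2]^T.$$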
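For a single error-prone covariable ($q = 1$), the quantities $d$ and $Q$ in Theorem 1 reduce to scalars. Write $\sigma_{uu}, \sigma_{u\varepsilon}, \sigma_{\varepsilon\varepsilon}$ for the entries of $\Sigma$. Then $Q$ equals the Schur complement $\sigma_{uu} - (\sigma_{u\varepsilon} - \sigma_{uu}\beta_2)^2/d$ of the covariance matrix of $(u, \varepsilon - u\beta_2)$. The Python sketch below is our own illustration, with hypothetical argument names. It computes $d$ and $Q$ directly from the definitions so that this identity can be checked numerically.

```python
def theorem1_scalars(s_uu, s_ue, s_ee, beta2):
    """Return (d, Q) of Theorem 1 for one error-prone covariable (q = 1).

    Illustrative sketch: Sigma = [[s_uu, s_ue], [s_ue, s_ee]] is assumed to be
    the covariance of a row (u, eps); argument names are our own.
    """
    # d = [beta2, -1] Sigma [beta2, -1]^T = Var(eps - u * beta2)
    d = beta2 ** 2 * s_uu - 2.0 * beta2 * s_ue + s_ee
    # Entries of Sigma^{-1} for the 2 x 2 case
    det = s_uu * s_ee - s_ue ** 2
    inv_uu, inv_ue, inv_ee = s_ee / det, -s_ue / det, s_uu / det
    # Q^{-1} = [1, beta2] Sigma^{-1} [1, beta2]^T
    q_inv = inv_uu + 2.0 * beta2 * inv_ue + beta2 ** 2 * inv_ee
    return d, 1.0 / q_inv
```

With $\sigma_{u\varepsilon} = 0$ this gives $d = \sigma^2 + \beta_2^2\sigma_{uu}$, which is the quantity that reappears as $\sigma^2(M)$ in Section 3.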


3.  Analysis  of  Covariance 

Consider a completely randomized two-group analysis of covariance, with covariables
subject to error. Formally, this problem can be subsumed into the more general structure
(2.1) by letting $X_2$ be the covariables and


We will let the $s_i$ represent treatment assignment, standardized to have mean zero and
variance one. Specifically,

$$s_i = \begin{cases} \{(1-\pi)/\pi\}^{1/2} & \text{with probability } \pi, \\ -\{\pi/(1-\pi)\}^{1/2} & \text{with probability } 1-\pi, \end{cases}$$

where $\pi$ is the probability of assignment into treatment #1. The treatment difference is
then $\alpha/\{\pi(1-\pi)\}^{1/2}$. We shall treat the true covariables as if they were random variables
independent of treatment assignment and with covariance matrix $\Sigma_x$. In order to
facilitate discussion we do not write down detailed assumptions; rather, we will apply
Theorem 1 formally, while we will assume appropriate conditions to compute the limiting
distribution of least squares. A more general result is given in Section 5.
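The standardization of $s_i$ and the stated treatment difference can be checked directly. The Python sketch below uses a hypothetical function name. It computes the two support points of $s_i$, their mean and variance, and $\alpha(s^{(1)} - s^{(2)})$. That last quantity is the difference in expected response between the two treatments when the covariables are balanced.

```python
import math

def two_point_moments(pi, alpha):
    """Mean and variance of s_i, and the implied treatment difference.

    Illustrative sketch of the Section 3 coding; the function name is our own.
    """
    s1 = math.sqrt((1.0 - pi) / pi)       # value for treatment #1, probability pi
    s2 = -math.sqrt(pi / (1.0 - pi))      # value for treatment #2, probability 1 - pi
    mean = pi * s1 + (1.0 - pi) * s2
    var = pi * s1 ** 2 + (1.0 - pi) * s2 ** 2 - mean ** 2
    # Difference in E[Y] between the two groups, covariable contribution held fixed
    diff = alpha * (s1 - s2)
    return mean, var, diff
```

For any $0 < \pi < 1$, the mean is zero and the variance is one up to rounding. The difference equals $\alpha/\{\pi(1-\pi)\}^{1/2}$.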

The  following  result  shows  that  as  long  as  treatment  assignment  is  random, 
asymptotically  least  squares  is  the  better  estimate  of  the  treatment  effect  a,  because 
both  estimates  are  asymptotically  normal  with  the  same  mean  and  least  squares  has  the 
smaller  variance. 

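Theorem 2. The least squares estimate $\hat\alpha_L$ is asymptotically normally distributed with
mean $\alpha$ and variance $\sigma^2(L)/N$, where

$$\sigma^2(L) = \sigma^2 + \beta_2^T\Sigma_u\beta_2 - \beta_2^T\Sigma_u(\Sigma_x + \Sigma_u)^{-1}\Sigma_u\beta_2. \tag{3.2}$$

The functional estimate $\hat\alpha_M$ has the same asymptotic mean but has asymptotic variance
$\sigma^2(M)/N$, where

$$\sigma^2(M) = \sigma^2 + \beta_2^T\Sigma_u\beta_2. \tag{3.3}$$

Since $\Sigma_u(\Sigma_x + \Sigma_u)^{-1}\Sigma_u$ is nonnegative definite, $\sigma^2(L) \le \sigma^2(M)$.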
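For one covariable ($q = 1$), (3.2) and (3.3) are easy to evaluate. Their difference, $\beta_2^2\Sigma_u^2/(\Sigma_x + \Sigma_u)$, is never negative. The minimal sketch below uses `var_x`, `var_u` and `var_e` for the scalars $\Sigma_x$, $\Sigma_u$ and $\sigma^2$. These names are chosen for illustration only.

```python
def ancova_variances(beta2, var_x, var_u, var_e):
    """sigma^2(L) of (3.2) and sigma^2(M) of (3.3) for one covariable (q = 1).

    Illustrative sketch; argument names are our own.
    """
    sigma2_m = var_e + beta2 ** 2 * var_u                              # (3.3)
    sigma2_l = sigma2_m - (beta2 * var_u) ** 2 / (var_x + var_u)       # (3.2)
    return sigma2_l, sigma2_m
```

For instance, take $\beta_2 = 2$ and $\Sigma_x = \Sigma_u = \sigma^2 = 1$. The ratio $\sigma^2(L)/\sigma^2(M)$ is then $3/5$.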
It is reasonable to conjecture that complete randomization is not necessary for
Theorem 2. For example, one might randomize in blocks or use alternative balancing
schemes; see Wei (1978). This conjecture is worth further study, and might be facilitated
by use of equation (A.7) in the appendix.

It  should  be  noted  that  in  a  balanced  randomized  study,  the  usual  t-test  for 
treatment  effect  has  correct  nominal  level  asymptotically.  Thus,  from  both  an  estimation 
and  inferential  standpoint,  for  large  samples  least  squares  will  be  preferred  over  the 
functional  estimate. 
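A small Monte Carlo sketch makes Theorem 2 concrete. It simulates a randomized two-group study with one covariable measured with error. It then regresses $Y$ on $(1, s, C)$ and compares the coefficient on $s$ with $\alpha$. This is our own illustration, and it assumes normal errors, a zero intercept and hypothetical parameter values. It uses only the Python standard library and solves the normal equations by Gaussian elimination.

```python
import random

def ols(rows, y):
    """Least squares coefficients via the normal equations (Gaussian elimination)."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))   # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * k
    for r in reversed(range(k)):                                   # back substitution
        coef[r] = (m[r][k] - sum(m[r][c] * coef[c] for c in range(r + 1, k))) / m[r][r]
    return coef

def simulate_ancova(n, alpha, beta2, var_x, var_u, var_e, pi, seed):
    """Hypothetical randomized ANCOVA data (zero intercept); LS fit of Y on (1, s, C)."""
    rng = random.Random(seed)
    hi, lo = ((1 - pi) / pi) ** 0.5, -(pi / (1 - pi)) ** 0.5     # standardized s_i values
    rows, y = [], []
    for _ in range(n):
        s = hi if rng.random() < pi else lo
        x = rng.gauss(0.0, var_x ** 0.5)          # true covariable, independent of s
        c = x + rng.gauss(0.0, var_u ** 0.5)      # observed covariable C = X_2 + U
        rows.append((1.0, s, c))
        y.append(alpha * s + beta2 * x + rng.gauss(0.0, var_e ** 0.5))
    return ols(rows, y)
```

Take the parameters in the test: $\alpha = 1$, $\beta_2 = 2$, $\Sigma_x = \Sigma_u = \sigma^2 = 1$ and $\pi = 1/2$. The coefficient on $s$ should then be close to $\alpha$. The coefficient on $C$ is attenuated towards $\beta_2\Sigma_x/(\Sigma_x + \Sigma_u) = 1$. Over many seeds, the variance of $N^{1/2}(\hat\alpha_L - \alpha)$ should approach $\sigma^2(L) = 3$ from (3.2).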

The  folklore  of  the  area  indicates  that,  asymptotically,  least  squares  estimates  are 
biased  but  generally  less  variable  than  errors-in-variables  estimates.  The  situation  that 
has  been  considered  in  this  section  is  one  in  which  the  least  squares  estimate  of 
treatment  effect  has  no  asymptotic  bias,  so  that  it  was  reasonable  to  conjecture  a 
preference  for  least  squares.  We  shall  show  in  Section  5,  however,  that  it  is  not  true 
that consistency of least squares for a linear combination of $\beta$ always means asymptotic
preferability  of  least  squares,  although  it  is  true  for  a  large  class  of  problems. 


4.  Some  Extensions 


In  some  instances  an  assumption  such  as  (2.2)  will  not  be  tenable  so  that  a 

functional  estimate  cannot  be  computed.  There  are  many  ways  out  of  this  dilemma.  One  is 

to take independent replicates $C_1, C_2$ of $X_2$ in (2.1). One can compute the normal

theory  functional  estimate  in  this  case  and  obtain  a  result  similar  to  Theorem  2,  but  more 

general  in  the  sense  that  the  underlying  random  variables  need  not  actually  be  normally 

distributed.  The  computation  of  this  functional  estimate  and  its  asymptotic  distribution 

theory  are  available  in,  for  example,  Gallo  (1982). 

There are instances other than randomized two-group analysis of covariance in which
certain linear combinations of the least squares estimate are consistent for the same
linear combinations of the parameter. Consider the model (2.1) with $\beta^T = (\beta_1^T, \beta_2^T)$, in
which it is desired to estimate the parameter $\gamma^T\beta$, where $\gamma^T = (\gamma_1^T, \gamma_2^T)$. Partitioning
$\Lambda$ in (2.3) into components $\Lambda_{ij}$ ($i, j = 1, 2$), we study, informally, the least squares estimate

$$\hat\beta_L = \{(X_1, C)^T(X_1, C)\}^{-1}(X_1, C)^TY.$$
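The following is a heuristic sketch of our own, not taken from the original. It assumes in addition that $U$ is uncorrelated with $\varepsilon$ and with the true values, so that $N^{-1}C^TC \to \Lambda_{22} + \Sigma_u$. Under these assumptions the probability limit of $\hat\beta_L$ can be written explicitly, and the result shows where the condition in the next theorem comes from:

$$\hat\beta_L \xrightarrow{p} \begin{pmatrix} \beta_1 + \Lambda_{11}^{-1}\Lambda_{12}\,\delta \\ \beta_2 - \delta \end{pmatrix}, \qquad \delta = (\Lambda_{22.1} + \Sigma_u)^{-1}\Sigma_u\beta_2, \qquad \Lambda_{22.1} = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12},$$

so that

$$\gamma^T\hat\beta_L - \gamma^T\beta \xrightarrow{p} (\gamma_1^T\Lambda_{11}^{-1}\Lambda_{12} - \gamma_2^T)\,\delta.$$

As $\beta_2$ and $\Sigma_u$ vary, $\delta$ can be any vector. This limit therefore vanishes for all parameter values exactly when $\gamma_1^T\Lambda_{11}^{-1}\Lambda_{12} = \gamma_2^T$.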


This  leads  us  to  a  result  which  is  proved  formally  by  Gallo  (1982): 


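Theorem 3. The least squares estimate $\gamma^T\hat\beta_L$ is consistent for $\gamma^T\beta$, i.e., converges in
probability to $\gamma^T\beta$ for all $\beta, \sigma^2, \Sigma_u$, if and only if

$$\gamma_1^T\Lambda_{11}^{-1}\Lambda_{12} = \gamma_2^T. \tag{4.2}$$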


To  see  the  relevance  of  Theorem  3,  consider  once  again  the  two  group  analysis  of 
covariance  of  Section  3.  Here  we  have 


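$$\gamma_2 = 0, \qquad \gamma_1^T = (0, 1), \qquad \Lambda_{11} = \text{Identity},$$

$$\Lambda_{12} = \begin{pmatrix} \operatorname{plim} N^{-1}e_1^TX_2 \\ \operatorname{plim} N^{-1}s^TX_2 \end{pmatrix},$$

where

$$X_1 = (e_1, s), \qquad s^T = (s_1, \ldots, s_N), \qquad e_1^T = (1\ 1\ \cdots\ 1).$$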

Theorem 3 says that the least squares estimate of the parameter $\alpha$ will be consistent for
$\alpha$ only when

$$N^{-1}s^TX_2 \xrightarrow{p} 0. \tag{4.3}$$

Note  that  (4.3)  is  simply  the  requirement  that  the  covariables  be  mean  balanced  across  the 
two  treatments.  Theorem  3  indicates  that  only  when  we  have  such  balance  will  the  least 
squares  treatment  effect  estimate  be  consistent. 
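Condition (4.3) can be illustrated by simulation. The Python sketch below is our own illustration, and its covariate-driven rule is hypothetical. It takes $\pi = 1/2$, so that $s_i = \pm 1$. It then compares $N^{-1}\sum_i s_i x_i$ under complete randomization with its value when assignment depends on the covariable, as might happen in an observational study.

```python
import random

def mean_imbalance(n, randomized, seed=0):
    """Return N^{-1} sum_i s_i x_i for s_i = +/-1 (pi = 1/2) and x_i ~ N(0, 1).

    randomized=True: s_i independent of x_i (complete randomization).
    randomized=False: hypothetical covariate-driven rule, s_i = +1 iff x_i > 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if randomized:
            s = 1.0 if rng.random() < 0.5 else -1.0
        else:
            s = 1.0 if x > 0 else -1.0
        total += s * x
    return total / n
```

Under randomization the statistic is near zero. Under the covariate-driven rule it settles near $E|x_i| \approx 0.80$ for standard normal $x_i$. In that case (4.3) fails and least squares is inconsistent for $\alpha$.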
5. Further Comparisons of Least Squares and Maximum Likelihood

On the basis of the previous discussion one might reasonably conjecture that when
least squares is consistent for $\gamma^T\beta$, then asymptotically it must be better than the
functional estimate of $\gamma^T\beta$. In model (2.1), our special cases such as analysis of
covariance have relied upon a degree of orthogonality between $X_2$ and the non-intercept
components of $X_1$. Specifically, for least squares one must deal with the term

$$N^{1/2}\{\gamma_1^T(X_1^TX_1)^{-1}X_1^TX_2 - \gamma_2^T\}, \tag{5.1}$$

which is a term in the linear expansion; see (A.2). For example, suppose $X_1$ and $X_2$
are very strongly orthogonal in the sense that

$$N^{1/2}\{\gamma_1^T(X_1^TX_1)^{-1}X_1^TX_2 - \gamma_2^T\} \to 0. \tag{5.2}$$


Since $X_2$ is unknown, (5.2) can never be verified. In fact it fails in a randomized
analysis of covariance with $\gamma_2 = 0$, $\gamma_1^T = (0\ 1)$: there, $N^{-1/2}\sum_i s_ix_{2i}$ remains of order one
in probability rather than tending to zero. However, (5.2) does imply (4.2) and
consistency of least squares. It is fairly easy to show that if (5.2) holds, then the
least squares estimate of $\gamma^T\beta$ can be no worse than the functional estimate, at least
asymptotically.
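The failure of (5.2) under randomization is a matter of scaling. $N^{-1}\sum_i s_i x_i$ tends to zero, but $N^{-1/2}\sum_i s_i x_i$ converges in distribution to a nondegenerate normal variable. The Python sketch below shows this, assuming $s_i = \pm 1$ with probability $1/2$ and standard normal $x_i$. Both choices are ours.

```python
import random
import statistics

def scaled_sums(n, reps, seed=0):
    """Average |N^{-1} sum s_i x_i| and |N^{-1/2} sum s_i x_i| over reps data sets."""
    rng = random.Random(seed)
    by_n, by_root_n = [], []
    for _ in range(reps):
        # s_i = +/-1 with probability 1/2, x_i standard normal and independent of s_i
        total = sum((1.0 if rng.random() < 0.5 else -1.0) * rng.gauss(0.0, 1.0)
                    for _ in range(n))
        by_n.append(abs(total) / n)
        by_root_n.append(abs(total) / n ** 0.5)
    return statistics.mean(by_n), statistics.mean(by_root_n)
```

The first average shrinks roughly like $N^{-1/2}$. The second stays near $E|Z| \approx 0.80$ for standard normal $Z$.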

Further investigation of the conjecture is rather technical. The conjecture is false
for the functional case in general. Consider an analysis of covariance in which the
treatment assignments $\{s_i\}$ occur in the fixed sequence $\{-1, +1, -1, +1, \ldots\}$, and let the
covariables $\{x_i\}$ be fixed. In a variety of circumstances, it can be shown that the least
squares estimate $\hat\alpha_L$ of the treatment effect $\alpha$ in model (3.1) satisfies

$$N^{1/2}(\hat\alpha_L - \alpha) \approx A_1V + A_2N^{-1/2}\sum_{i=1}^N s_ix_i, \tag{5.3}$$


where $A_1$ and $A_2$ are constants and $V$ is a weighted sum of independent observations
not depending on $\{s_i, x_i\}$. Equation (5.3) shows that asymptotic normality with mean zero
of the least squares estimate, when centered at the treatment effect $\alpha$, requires that

$$N^{-1/2}\sum_{i=1}^N s_ix_i \tag{5.4}$$

either converge in probability to zero or be itself asymptotically normally
distributed. For the functional model, the latter case is not possible, while the former
case is (5.2). Since (5.4) can diverge as $N \to \infty$ with (4.2) still holding, for the
functional case this means that least squares will not always be asymptotically better
than maximum likelihood when least squares is consistent.
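The construction below, chosen by us for illustration only, shows how (5.4) can diverge while (4.2) holds. Take the alternating assignments $s_i = -1, +1, -1, +1, \ldots$ and the hypothetical fixed covariables $x_i = z_i + s_i i^{-1/4}$. Here $z_i$ repeats the pattern $1, 1, -1, -1$, so that $\sum_i s_i z_i = 0$ over each block of four and $\Lambda$ tends to the identity. Then $\sum_i s_i x_i$ grows like $\tfrac{4}{3}N^{3/4}$. Consequently $N^{-1}\sum_i s_i x_i \to 0$, while $N^{-1/2}\sum_i s_i x_i \to \infty$.

```python
def balance_stats(n):
    """(N^{-1} sum s_i x_i, N^{-1/2} sum s_i x_i) for a hypothetical fixed design."""
    z_pattern = (1.0, 1.0, -1.0, -1.0)
    total = 0.0
    for i in range(1, n + 1):
        s = -1.0 if i % 2 == 1 else 1.0      # fixed sequence -1, +1, -1, +1, ...
        z = z_pattern[(i - 1) % 4]           # bounded part, orthogonal to s over each block of 4
        x = z + s * i ** -0.25               # hypothetical covariable
        total += s * x
    return total / n, total / n ** 0.5
```

Evaluated for $N = 10^2, \ldots, 10^5$, the first statistic decreases towards zero while the second keeps growing.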

Now consider the structural case in which the rows of the matrix $(X_1, X_2)$ are independent
and identically distributed. The first column of $X_1$ is a column of ones and $X_1$ is
observed exactly, while $X_2$ is observed with error as in model (2.1). Suppose we are
interested in estimating a linear combination $\gamma^T\beta$ for which least squares is known to be
consistent, i.e., (4.2) holds.


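Theorem 4. Make the following assumption:

Given $X_1$, the rows of $R = X_2 - X_1\Lambda_{11}^{-1}\Lambda_{12}$ are independent
and identically distributed with mean zero and covariance

$$\Lambda_{22.1} = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}.$$

Further, suppose that $R$ is distributed independently of $\varepsilon$ and $U$.     (5.5)

If we define

$$A = (\Lambda_{22.1} + \Sigma_u)^{-1},$$

we have that the least squares and functional maximum likelihood estimates of $\gamma^T\beta$ are
asymptotically normally distributed with mean $\gamma^T\beta$ and variances $\sigma^2(L)/N$ and $\sigma^2(M)/N$,
respectively, where $\sigma^2(L) \le \sigma^2(M)$. In fact

$$\sigma^2(L) = \sigma^2(M) - (\gamma_1^T\Lambda_{11}^{-1}\gamma_1)(\beta_2^T\Sigma_uA\Sigma_u\beta_2),$$

$$\sigma^2(M) = (\gamma_1^T\Lambda_{11}^{-1}\gamma_1)(\sigma^2 + \beta_2^T\Sigma_u\beta_2).$$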
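Theorem 4 can be evaluated numerically for a single error-prone covariable. In the sketch below, `g` stands for the scalar $\gamma_1^T\Lambda_{11}^{-1}\gamma_1$, which the caller is assumed to have computed. The function and argument names are our own. In the randomized analysis of covariance, $\Lambda_{11}$ is the identity and $\gamma_1^T = (0, 1)$, so $g = 1$ and $\Lambda_{22.1} = \Sigma_x$. The formulas then reduce to (3.2) and (3.3).

```python
def theorem4_variances(g, beta2, lam22_1, var_u, var_e):
    """sigma^2(L) and sigma^2(M) of Theorem 4 for one error-prone covariable (q = 1).

    g is gamma_1^T Lambda_11^{-1} gamma_1, assumed to be computed by the caller.
    """
    a = 1.0 / (lam22_1 + var_u)                      # A = (Lambda_22.1 + Sigma_u)^{-1}
    sigma2_m = g * (var_e + beta2 ** 2 * var_u)
    sigma2_l = sigma2_m - g * (beta2 * var_u) ** 2 * a
    return sigma2_l, sigma2_m
```

The subtracted term is a nonnegative multiple of $g$, so the ordering $\sigma^2(L) \le \sigma^2(M)$ holds for every admissible input.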


The proof of Theorem 4 is given in Appendix 2. Note that it includes Theorem 2 as a
special case, because when $X_1$ is distributed independently of $X_2$, (5.5) holds.
That Theorem 4 may fail when assumption (5.5) is violated is sketched in Appendix 3.

It may be considered a bit unfair to compare least squares to a "maximum likelihood
estimator" which does not take into account the consistency of least squares. It turns
out that, under normality assumptions, the maximum likelihood estimate of $\gamma^T\beta$, when it is
known that least squares is consistent for $\gamma^T\beta$, is simply the least squares estimate of
$\gamma^T\beta$. Specifically, we have the following.

Theorem 5. Suppose that, given $X_1$, (5.5) holds and the rows of $R = X_2 - X_1\Lambda_{11}^{-1}\Lambda_{12}$ are
normally distributed independently of $\varepsilon$ and $U$. Then the maximum likelihood estimate of
$\gamma^T\beta$, given $X_1$ and subject to (4.2), is simply the least squares estimate of $\gamma^T\beta$.
6. Conclusion

In a particular errors-in-variables regression model, we have shown that least
squares will often be asymptotically more efficient than a particular functional
regression estimate, when the former is known to be consistent. This happens in
particular when those variables $X_2$ subject to error are distributed independently of
those variables $X_1$ measured without error, or more generally when $X_2$ follows a linear
regression in $X_1$. An important special case of this least squares preference phenomenon
is a randomized analysis of covariance where one wants to estimate the treatment effect.

Finally, if the residuals of the linear regression of $X_2$ on $X_1$ follow a multinormal
distribution, and if it is known that least squares is consistent for the linear
combination $\gamma^T\beta$, then least squares is the maximum likelihood estimate for $\gamma^T\beta$.
Appendix 1. The maximum likelihood estimator for model (2.1)

Define

$$L = I - X_1(X_1^TX_1)^{-1}X_1^T, \qquad W = [C\ Y]^T L\,[C\ Y].$$

Let $\hat\theta$ be the smallest eigenvalue of $\Omega_0^{-1}W$, where $\Omega_0$ is given in (2.2).
Define

$$C_* = (X_1, C) = (X_1, X_2 + U),$$

The matrix $D$ is non-singular with probability one, and the functional estimate is

$$\hat\beta_M = D^{-1}C_*^TY.$$

The calculation of $\hat\beta_M$ is derived by Gallo (1982) and relies on similar work of Gleser
(1981) and Healy (1980).
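The first step of Appendix 1, computing $\hat\theta$, is easy to sketch for one covariable when $X_1$ is only the intercept column. In that case $L$ simply centres the columns of $[C\ Y]$. Because $\Omega_0$ is positive definite, the smallest eigenvalue of $\Omega_0^{-1}W$ is the smaller root of the quadratic $|W - \theta\Omega_0| = 0$. The sketch covers only $\hat\theta$, not the full estimate $\hat\beta_M$. The function name and argument layout are our own.

```python
import math

def theta_hat(c, y, omega):
    """Smallest root of |W - theta * Omega_0| = 0 when X_1 is the intercept column only.

    c, y: observed covariable and response; omega = ((o11, o12), (o12, o22)) is Omega_0.
    Illustrative sketch; names are hypothetical.
    """
    n = len(c)
    cbar, ybar = sum(c) / n, sum(y) / n
    # W = [C Y]^T L [C Y]; with X_1 = intercept, L centres each column
    w11 = sum((ci - cbar) ** 2 for ci in c)
    w12 = sum((ci - cbar) * (yi - ybar) for ci, yi in zip(c, y))
    w22 = sum((yi - ybar) ** 2 for yi in y)
    (o11, o12), (_, o22) = omega
    # det(W - t * Omega_0) = a t^2 + b t + k
    a = o11 * o22 - o12 ** 2
    b = -(w11 * o22 + w22 * o11 - 2.0 * w12 * o12)
    k = w11 * w22 - w12 ** 2
    disc = math.sqrt(max(b * b - 4.0 * a * k, 0.0))
    return (-b - disc) / (2.0 * a)        # smaller root, since a > 0
```

When the centred points lie exactly on a line and $\Omega_0 = I$, $W$ is singular and $\hat\theta = 0$.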


Appendix 2

The following general result can be justified formally and is at the heart of the
analysis of covariance calculations. We sketch herein a proof without stating all the
necessary regularity conditions. Recall $e_1^T = (1\ 1\ \cdots\ 1)$.

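Lemma A. Define

$$\Lambda_{22.1} = \Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12}, \qquad A = (\Lambda_{22.1} + \Sigma_u)^{-1},$$

and suppose that $\gamma$ satisfies (4.2) as well as

$$N^{-1/2}X_1^T(R + U) = O_p(1), \tag{A.1}$$

where $R = X_2 - X_1\Lambda_{11}^{-1}\Lambda_{12}$. Then the least squares estimate satisfies

$$N^{1/2}\gamma^T(\hat\beta_L - \beta) = N^{-1/2}\gamma_1^T\Lambda_{11}^{-1}X_1^T(\varepsilon - U\beta_2) + N^{-1/2}\gamma_1^T\Lambda_{11}^{-1}X_1^T(R + U)\,\delta + o_p(1), \tag{A.2}$$

where

$$\delta = A\Sigma_u\beta_2.$$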


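Proof (Sketch): Define $C_* = [X_1, X_2 + U] = [X_1, C]$, so that $Y = C_*\beta + (\varepsilon - U\beta_2)$. The
normal equations for the $X_1$ block give

$$\hat\beta_{L1} - \beta_1 = (X_1^TX_1)^{-1}X_1^T\{(\varepsilon - U\beta_2) - C(\hat\beta_{L2} - \beta_2)\}. \tag{A.3}$$

Multiply both sides of (A.3) by $N^{1/2}\gamma_1^T$, add $N^{1/2}\gamma_2^T(\hat\beta_{L2} - \beta_2)$ and substitute
$C = X_1\Lambda_{11}^{-1}\Lambda_{12} + (R + U)$; by (4.2) the terms involving $\Lambda_{11}^{-1}\Lambda_{12}$ cancel, and we get

$$N^{1/2}\gamma^T(\hat\beta_L - \beta) = N^{1/2}\gamma_1^T(X_1^TX_1)^{-1}X_1^T(\varepsilon - U\beta_2) - N^{1/2}\gamma_1^T(X_1^TX_1)^{-1}X_1^T(R + U)(\hat\beta_{L2} - \beta_2). \tag{A.4}$$

By Slutsky's Theorem the first term on the right hand side of (A.4) equals

$$N^{-1/2}\gamma_1^T\Lambda_{11}^{-1}X_1^T(\varepsilon - U\beta_2) + o_p(1),$$

which is the same as the first term on the right hand side of (A.2). Since
$\hat\beta_{L2} - \beta_2 = -W^{-1}\Sigma_u\beta_2 + o_p(1)$, the second term in (A.4) is

$$N^{1/2}\gamma_1^T(X_1^TX_1)^{-1}X_1^T(R + U)\,W^{-1}\Sigma_u\beta_2 + o_p(1),$$

where

$$W = N^{-1}C^T\{I - X_1(X_1^TX_1)^{-1}X_1^T\}C \xrightarrow{p} \Lambda_{22.1} + \Sigma_u, \qquad (X_1^TX_1/N) \xrightarrow{p} \Lambda_{11}.$$

By (A.1), this completes the proof.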

One should note that (A.1) is satisfied in the randomized two-group analysis of
covariance  of  Section  3. 

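Using Lemma A and writing, for the analysis of covariance,

$$X_2^T = (x_{21}\ x_{22}\ \cdots\ x_{2N}), \qquad U^T = (u_1\ u_2\ \cdots\ u_N), \qquad m_2 = N^{-1}\sum_{i=1}^N x_{2i},$$

we see that for the discussion in Section 3,

$$N^{1/2}(\hat\alpha_L - \alpha) = N^{-1/2}\sum_{i=1}^N s_i\{\varepsilon_i + (\eta - \beta_2)^Tu_i + \eta^T(x_{2i} - m_2)\} + o_p(1), \tag{A.7}$$

$$\eta = (\Sigma_x + \Sigma_u)^{-1}\Sigma_u\beta_2 = \beta_2 - (\Sigma_x + \Sigma_u)^{-1}\Sigma_x\beta_2.$$

The expression (A.7) shows why Theorem 2 may apply to alternative randomization schemes.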
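For one covariable, the variance of the summand in (A.7) can be checked against (3.2). Since $s_i$ has variance one and is independent of $(\varepsilon_i, u_i, x_{2i})$, the asymptotic variance of $N^{1/2}(\hat\alpha_L - \alpha)$ is that of $\varepsilon_i + (\eta - \beta_2)u_i + \eta(x_{2i} - m_2)$. The Python sketch below works under these assumptions, with scalar $\Sigma_x$, $\Sigma_u$ and $\sigma^2$ and independent $\varepsilon_i$, $u_i$, $x_{2i}$. The function names are ours.

```python
def a7_summand_variance(beta2, var_x, var_u, var_e):
    """Variance of eps + (eta - beta2) u + eta (x - m) in (A.7), scalar case."""
    eta = var_u * beta2 / (var_x + var_u)     # eta = (Sigma_x + Sigma_u)^{-1} Sigma_u beta_2
    # The three pieces are assumed independent, so their variances add
    return var_e + (eta - beta2) ** 2 * var_u + eta ** 2 * var_x

def sigma2_l(beta2, var_x, var_u, var_e):
    """sigma^2(L) from (3.2), scalar case."""
    return var_e + beta2 ** 2 * var_u - (beta2 * var_u) ** 2 / (var_x + var_u)
```

Algebraically, both functions reduce to $\sigma^2 + \beta_2^2\Sigma_x\Sigma_u/(\Sigma_x + \Sigma_u)$.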

Proof of Theorem 4. The form of $\sigma^2(M)$ follows directly from Theorem 1. The form of
$\sigma^2(L)$ follows from (A.2) and the assumptions of the Theorem.




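Proof of Theorem 5

First assume that $\Lambda_{11}^{-1}\Lambda_{12}$ is known. Define

$$\tau = (I, \Lambda_{11}^{-1}\Lambda_{12})\beta = \beta_1 + \Lambda_{11}^{-1}\Lambda_{12}\beta_2,$$

$$f^2 = \sigma^2 + \beta_2^T\Lambda_{22.1}\beta_2 - \beta_2^T\Lambda_{22.1}A\Lambda_{22.1}\beta_2.$$

Given $(X_1, C)$, we have

$$Y \mid (X_1, C) = X_1\tau + SA\Lambda_{22.1}\beta_2 + F, \tag{A.8}$$

where $S = C - X_1\Lambda_{11}^{-1}\Lambda_{12}$ and the rows of $F$ are independent normal random variables with
mean zero and variance $f^2$. If we define

$$\xi = A\Lambda_{22.1}\beta_2, \qquad L = \Lambda_{22.1} + \Sigma_u,$$

then the mapping of $\beta_1, \beta_2, \sigma^2, \Lambda_{22.1}$ to $\tau, \beta_2, \sigma^2, L$ is one-to-one from the space
$\{\sigma^2 > 0, \Lambda_{22.1} > 0\}$ to the space $\{\sigma^2 > 0, L - \Sigma_u > 0\}$. One next shows that the map
$\tau, \beta_2, L, \sigma^2$ to $\tau, \xi, L, f^2$ is also one-to-one onto the space $\{f^2 > 0, L > 0\}$.

However, the maximum likelihood estimates of $\tau$ and $\xi$ are seen from (A.8) to be

$$\{(X_1, S)^T(X_1, S)\}^{-1}(X_1, S)^TY.$$

Since the column space of $(X_1, C)$ is the same as the column space of $(X_1, S)$, it follows
that, given $(X_1, S, \Lambda_{11}^{-1}\Lambda_{12})$, the maximum likelihood and least squares estimates of $\tau$
coincide, i.e.,

$$\hat\tau(\text{mle}) = (I, \Lambda_{11}^{-1}\Lambda_{12})\hat\beta_L.$$

This means that $\gamma^T\hat\beta_L$ is the maximum likelihood estimate of $\gamma_1^T\tau$, given $X_1$, $S$ and
$\Lambda_{11}^{-1}\Lambda_{12}$. Since, under (4.2), $\gamma^T\beta = \gamma_1^T\tau$, the proof is complete.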
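A key step in the proof of Theorem 5 is that regressing on $(X_1, S)$ instead of $(X_1, C)$ only reparameterizes the least squares fit. This can be seen numerically when $X_1$ is the intercept column and $\Lambda_{11}^{-1}\Lambda_{12}$ is a scalar $k$. The sketch below uses hypothetical data and a hypothetical value of $k$. It checks that the slope is unchanged and that the new intercept equals $(1, k)\hat\beta_L$. It illustrates only this algebraic step, not the likelihood argument.

```python
def simple_ls(x, y):
    """Intercept and slope of the least squares line of y on (1, x)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data; X_1 is the intercept column, so Lambda_11^{-1} Lambda_12 is a scalar k
c = [0.3, 1.7, 2.2, 3.9, 5.1]
y = [1.0, 2.4, 2.9, 5.2, 6.0]
k = 2.5
b1, b2 = simple_ls(c, y)                                # least squares on (X_1, C)
tau_hat, xi_hat = simple_ls([ci - k for ci in c], y)    # on (X_1, S) with S = C - X_1 k
```

The two fits give the same fitted values. Only the coordinates change: $\hat\xi = \hat\beta_{L2}$ and $\hat\tau = \hat\beta_{L1} + k\hat\beta_{L2}$.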
Appendix 3. A Counterexample


If the rows of $(X_1, X_2)$ are independent and identically distributed but (5.5) does
not hold, it is possible to construct a counterexample to Theorem 4. The way to do this
is to consider model (3.1) but with the pairs $\{(s_i, x_i)\}$ satisfying


Es  -  Es  -  0  , 


2  2 

Es  -  Ex 


$$E x = \theta_1 \neq 0, \qquad E sx = \theta_2 \neq 0,$$


V  -  . 


In this case, the expansion (A.2) still holds and the last term in this expansion is

$$N^{-1/2}\sum_{i=1}^N(\theta_1s_i - \theta_2)(x_i - \theta_1 - \theta_2s_i + u_i)\,\delta. \tag{A.9}$$


The key to Theorem 4 is that, under (5.5), $(x_i - \theta_1 - \theta_2s_i + u_i)$ has mean zero and
variance $\Lambda_{22.1} + \Sigma_u$. Without assumption (5.5), one can see that while (A.9) has mean zero, its
variance can depend on the fourth moment of $\{s_i\}$. By manipulating this fourth moment
appropriately, Theorem 4 can be made to fail.